Polygenic risk score-based phenome-wide association study of head and neck cancer across two large biobanks

Background Numerous observational studies have highlighted associations between genetic predisposition to head and neck squamous cell carcinoma (HNSCC) and diverse risk factors, but these findings are constrained by the design limitations of observational studies. In this study, we used a phenome-wide association study (PheWAS) approach, incorporating a polygenic risk score (PRS) derived from a wide array of genomic variants, to systematically investigate phenotypes associated with genetic predisposition to HNSCC. Furthermore, we validated our findings across heterogeneous cohorts, enhancing the robustness and generalizability of our results.

Methods We derived PRSs for HNSCC and its subgroups, oropharyngeal cancer and oral cancer, using large-scale genome-wide association study summary statistics from the Genetic Associations and Mechanisms in Oncology Network. Leveraging genotyping data and electronic health records from 308,492 individuals in the UK Biobank and 38,401 individuals in the Penn Medicine Biobank (PMBB), we performed PheWAS to elucidate the associations between the PRS and a wide spectrum of phenotypes.

Results The HNSCC PRS showed significant associations with phenotypes related to tobacco use disorder (OR, 1.06; 95% CI, 1.05–1.08; P = 3.50 × 10−15), alcoholism (OR, 1.06; 95% CI, 1.04–1.09; P = 6.14 × 10−9), alcohol-related disorders (OR, 1.08; 95% CI, 1.05–1.11; P = 1.09 × 10−8), emphysema (OR, 1.11; 95% CI, 1.06–1.16; P = 5.48 × 10−6), chronic airway obstruction (OR, 1.05; 95% CI, 1.03–1.07; P = 2.64 × 10−5), and cancer of bronchus (OR, 1.08; 95% CI, 1.04–1.13; P = 4.68 × 10−5). These findings were replicated in the PMBB cohort, and sensitivity analyses, including the exclusion of HNSCC cases and the major histocompatibility complex locus, confirmed the robustness of these associations. Additionally, we identified significant associations between the HNSCC PRS and lifestyle factors related to smoking and alcohol consumption.

Conclusions The study demonstrated the potential of PRS-based PheWAS in revealing associations between genetic risk factors for HNSCC and various phenotypic traits. The findings emphasized the importance of considering genetic susceptibility in understanding HNSCC and highlighted shared genetic bases between HNSCC and other health conditions and lifestyles.

Supplementary Information The online version contains supplementary material available at 10.1186/s12916-024-03305-2.

Table S4. Odds ratio for HNSCC and its subtypes associated with genetic risk across subgroups by age, sex, and smoking status in the UK Biobank.
Table S5. Odds ratio for HNSCC and its subtypes associated with genetic risk in the Penn Medicine Biobank.
Table S6. Odds ratio for HNSCC associated with genetic risk across different case-control ratios in the UK Biobank and Penn Medicine Biobank.
Table S7. The ancestry-specific odds ratio for HNSCC associated with genetic risk in the Penn Medicine Biobank.
Table S8. Full results of HNSCC PRS-PheWAS in UK Biobank and Penn Medicine Biobank.
Table S9. Full results of OPC PRS-PheWAS in UK Biobank and Penn Medicine Biobank.

ICD-9 codes: Union of Oropharynx and Oral cavity.
ICD-10 codes: Union of Oropharynx and Oral cavity.
ICD-10 codes: Oral cavity (C02.0-C02.9, C03.0-C03.9, C04.0-C04.9, and C05.0-C06.9).
Abbreviations: GAME-ON, Genetic Associations and Mechanisms in Oncology; ICD, International Statistical Classification of Diseases and Related Health Problems; HNSCC, head and neck squamous cell carcinoma; OC, oral cavity cancer; OPC, oropharynx cancer.

Figure S2. Prevalence plot for significant phenotypes in PheWAS according to genetic risk groups.

(HNSCC [5,974 cases and 4,012 controls], OPC [2,617 cases and 4,012 controls], and OC [2,958 cases and 4,012 controls]). The GWASs were performed using PLINK 1.90 with sex, age, 10 PCs, and genotyping batch as covariates. The genotype data for the oral and pharyngeal OncoArray study can be downloaded from the database of Genotypes and Phenotypes (dbGaP) under accession phs001202.v1.p1. Of note, the GWASs did not include the additional external controls (2,476 shared controls [1,453 from the EPIC study and 1,023 from the Toronto study]) beyond the GAME-ON data used by

Detailed information on the genotype data quality control and imputation procedures.

Method S3. Sample-level QC was performed by excluding samples on the basis of (i) mismatched sex or (ii) having second-degree or closer relatives also in the Biobank. We inferred ancestry by projecting array genotype data onto PC axes defined by individuals from HapMap3 [20]. We then applied a kernel density estimator (KDE) to all samples to determine their genetically informed ancestry. We trained a KDE using the HapMap3 PCs and used the KDEs to calculate the likelihood of a given sample belonging to each of the five continental ancestry groups. Samples were excluded from analysis if no ancestry likelihoods were greater than 0.3, or if more than three ancestry likelihoods were greater than 0.3. After exclusion, a total of 27,933 individuals classified as European (non-Hispanic White) ancestry and 10,468 individuals classified as African American (non-Hispanic Black) ancestry were eligible for the replication analyses.

Method S4. Generation of polygenic risk scores.
We constructed PRSs for HNSCC, OPC, and OC using a Bayesian polygenic prediction method, PRS-CS [22], which infers the posterior mean effect size of each variant from a linkage disequilibrium (LD) reference panel and GWAS summary statistics. The 1000G Project phase 3 EUR data were used as the external LD reference panel. The posterior SNP effect sizes in PRS-CS were inferred from GAME-ON summary statistics, with default settings and automatic estimation of the global shrinkage parameter (PRS-CS-auto). The individual PRSs were computed from beta coefficients as the weighted sum of the risk alleles using PLINK version 1.90 with the --score command [23]. The number of SNPs used in the analysis is given in the following table.
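The weighted-sum step described in Method S4 can be illustrated with a minimal sketch. It assumes a dosage matrix (individuals by SNPs, counting effect alleles) and a vector of posterior effect sizes are already in memory. The function names `polygenic_risk_score` and `standardize` are hypothetical and do not come from the study, which used PLINK for this step; rescaling to SD units is also an assumption, suggested only by the SD abbreviation in the table footnotes.

```python
import numpy as np


def polygenic_risk_score(dosages, betas):
    """Return one score per individual: PRS_i = sum_j beta_j * dosage_ij.

    dosages: array of shape (n_individuals, n_snps), effect-allele counts (0..2).
    betas:   array of shape (n_snps,), per-SNP effect sizes (assumed inputs).
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != betas.shape[0]:
        raise ValueError("dosages must be (n_individuals, n_snps) matching betas")
    return dosages @ betas


def standardize(scores):
    """Rescale scores to mean 0 and SD 1 (assumed convention for per-SD effects)."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()
```

Note that PLINK 1.9 reports an averaged score by default and a plain sum only when the `sum` modifier is given to `--score`; the sketch follows the plain sum stated in the text.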

Table. Number of SNPs used in generating PRSs.
Number of missing data for each variable in the UK Biobank.

Table S1. Characteristics of participants in the UK Biobank.
* P-value indicates the significance of the difference between the control and HNSCC case groups. Abbreviations: HPV, human papillomavirus; HNSCC, head and neck squamous cell carcinoma; OC, oral cavity cancer; OPC, oropharynx cancer; SD, standard deviation.

Table S2. Characteristics of participants in the Penn Medicine Biobank.
* P-value indicates the significance of the difference between the control and HNSCC case groups. Abbreviations: HNSCC, head and neck squamous cell carcinoma; OC, oral cavity cancer; OPC, oropharynx cancer; SD, standard deviation.

Table S3. Odds ratio for HNSCC and its subtypes associated with genetic risk in the UK Biobank.
All analyses were adjusted for age, sex, genotype array, and PCs 1 to 10. * The proportion of variance explained by the PRS alone was computed as Nagelkerke's pseudo-R2. Abbreviations: HNSCC, head and neck squamous cell carcinoma; OC, oral cavity cancer; OPC, oropharynx cancer; PRS, polygenic risk score; SD, standard deviation; OR, odds ratio; CI, confidence interval; PC, principal component.
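Nagelkerke's pseudo-R2, named in the table footnotes, rescales the Cox-Snell R2 so that its maximum is 1. The sketch below assumes the log-likelihoods of already fitted logistic models are available. Treating "PRS alone" as the difference between a covariates-plus-PRS model and a covariates-only model is an assumption made for illustration, not something the source states, and the function names are hypothetical.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's pseudo-R2 from log-likelihoods.

    ll_null:  log-likelihood of the intercept-only model.
    ll_model: log-likelihood of the fitted model.
    n:        number of observations.
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def incremental_r2(ll_null, ll_covariates, ll_full, n):
    """Assumed definition of 'PRS alone': R2(covariates + PRS) - R2(covariates)."""
    return nagelkerke_r2(ll_null, ll_full, n) - nagelkerke_r2(ll_null, ll_covariates, n)
```

A model no better than the intercept-only model gives 0, and a model that predicts every outcome perfectly (log-likelihood 0) gives 1.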

Table S4. Odds ratio for HNSCC and its subtypes associated with genetic risk across subgroups by age, sex, and smoking status in the UK Biobank.

Table S5. Odds ratio for HNSCC and its subtypes associated with genetic risk in the Penn Medicine Biobank.
* The proportion of variance explained by the PRS alone was computed as Nagelkerke's pseudo-R2. Abbreviations: HNSCC, head and neck squamous cell carcinoma; OC, oral cavity cancer; OPC, oropharynx cancer; PRS, polygenic risk score; SD, standard deviation; OR, odds ratio; CI, confidence interval; PC, principal component.

Table S6. Odds ratio for HNSCC associated with genetic risk across different case-control ratios in the UK Biobank and Penn Medicine Biobank.
¹ The UK Biobank analyses were adjusted for age, sex, genotype array, and PCs 1 to 10. ² The Penn Medicine Biobank analyses were adjusted for age, sex, ethnicity, and PCs 1 to 10. * Controls were extracted from samples matched for age and sex with cases for each ratio using the "MatchIt" R package. **
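The idea of drawing age- and sex-matched controls at a fixed case-control ratio can be sketched as follows. This is a simple exact-matching illustration and not the algorithm of the R package named in the footnote, whose settings the source does not report. The record layout (dictionaries with `id`, `age`, and `sex` keys) and the function name `match_controls` are hypothetical.

```python
import random
from collections import defaultdict


def match_controls(cases, controls, ratio, seed=0):
    """For each case, draw `ratio` unused controls with the same age and sex.

    Raises ValueError when a stratum runs out of controls, so that an
    incomplete match is never returned silently.
    """
    rng = random.Random(seed)

    # Group controls by (age, sex) and shuffle each group once.
    pool = defaultdict(list)
    for control in controls:
        pool[(control["age"], control["sex"])].append(control)
    for group in pool.values():
        rng.shuffle(group)

    matched = []
    for case in cases:
        group = pool[(case["age"], case["sex"])]
        if len(group) < ratio:
            raise ValueError(f"not enough controls for case {case['id']}")
        # Popping removes the control from the pool (matching without replacement).
        matched.extend(group.pop() for _ in range(ratio))
    return matched
```

Calling the function once per ratio (for example 1, 2, or 4 controls per case) would yield the control sets for a sensitivity analysis of this kind.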

Table S7. The ancestry-specific odds ratio for HNSCC associated with genetic risk in the Penn Medicine Biobank.
* The proportion of variance explained by the PRS alone was computed as Nagelkerke's pseudo-R2. Abbreviations: PMBB, Penn Medicine Biobank; HNSCC, head and neck squamous cell carcinoma; SD, standard deviation; OR, odds ratio; CI, confidence interval; PC, principal component.

Table S8. Full results of HNSCC PRS-PheWAS in UK Biobank and Penn Medicine Biobank.

Table S9. Full results of OPC PRS-PheWAS in UK Biobank and Penn Medicine Biobank.

Table S10. Full results of OC PRS-PheWAS in UK Biobank and Penn Medicine Biobank.
* Tables S8-10 are provided in Additional file 2 (as an Excel file).